A Hybrid Human-computer Approach to the Extraction of Scientific Facts from the Literature
Abstract
A wealth of valuable data is locked within the millions of research articles published each year. Reading and extracting pertinent information from those articles has become an unmanageable task for scientists. This problem hinders scientific progress by making it hard to build on results buried in literature. Moreover, these data are loosely structured, encoded in manuscripts of various formats, embedded in different content types, and are, in general, not machine accessible. We present a hybrid human-computer solution for semi-automatically extracting scientific facts from literature. This solution combines an automated discovery, download, and extraction phase with a semi-expert crowd assembled from students to extract specific scientific facts. To evaluate our approach we apply it to a challenging molecular engineering scenario, extraction of a polymer property: the Flory-Huggins interaction parameter. We demonstrate useful contributions to a comprehensive database of polymer properties.
Related articles
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
A hybridization of evolutionary fuzzy systems and ant Colony optimization for intrusion detection
A hybrid approach for intrusion detection in computer networks is presented in this paper. The proposed approach combines an evolutionary-based fuzzy system with an Ant Colony Optimization procedure to generate high-quality fuzzy-classification rules. We applied our hybrid learning approach to network security and validated it using the DARPA KDD-Cup99 benchmark data set. The results indicate t...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples become extremely hard. As a result, neural networks and clustering a...
Feature extraction of hyperspectral images using boundary semi-labeled samples and hybrid criterion
Feature extraction is a very important preprocessing step for classification of hyperspectral images. The linear discriminant analysis (LDA) method fails to work in small sample size situations. Moreover, LDA has poor efficiency for non-Gaussian data. LDA is optimized by a global criterion. Thus, it is not sufficiently flexible to cope with the multi-modal distributed data. We propose a new fea...
Solving the vehicle routing problem by a hybrid meta-heuristic algorithm
The vehicle routing problem (VRP) is one of the most important combinatorial optimization problems and has received much attention because of its real-world applications in industrial and service settings. The VRP involves routing a fleet of vehicles, each visiting a set of nodes, such that every node is visited exactly once by exactly one vehicle. So, the objective is to minimize the to...
Journal:
- Procedia Computer Science
Volume 80
Publication date: 2016